Day 12 - Regular expressions - Single characters
$ grep -E "d." examples.txt
dog
beholder
dryad
dog
aardvark
corn dog
direwolf
phase spider
undead red dragon
Spider-Man [*]
wild hog
Big Bad Wolf
As you can see, grep highlights groups of two characters, each of them starting with d: the “do” in dog,
the “de” in spider, the “dr” in dragon. All these lines have one thing in common: they contain a d
followed by another character (a letter or a space, as happens for example in “wild hog”).
This is what the symbol . does in a regular expression. It doesn’t mean a full stop, as in standard
punctuation, but “any character”. Wherever a regular expression contains a . the text can have any
single character in that position. Mind the fact that one . matches exactly one character.
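To make the "one dot, one character" rule visible without relying on colour highlighting, here is a minimal sketch. The sample lines are made up for illustration and are not the contents of examples.txt, and the -o option (print only the matched text) is available in GNU and BSD grep but is not required by POSIX.

```shell
# Hypothetical sample data, not the book's examples.txt
sample=$(printf '%s\n' 'dog' 'wild hog' 'bird' 'add')

# -o prints only the part of each line that matched, one match per line
matches=$(printf '%s\n' "$sample" | grep -oE "d.")
printf '%s\n' "$matches"
# prints three matches: "do", "d " (d followed by a space) and "dd"
```

The line "bird" produces no match: its d is the last character of the line, so there is nothing left for the . to match.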
As you can see, then, regular expressions are simple strings, but they can contain both normal
characters (mostly letters of the alphabet, both lowercase and uppercase, and digits) and special
ones. So far we have learned about only one of the special characters, that is ., commonly called “dot” in
this context.
How do we match a proper dot in the string? Since regular expressions assign a special meaning to
some characters, when you want to use those characters for their original value you have to escape
them with a \ (backslash). So, while
$ grep -E "1.1" examples.txt
Police 101
HTTP/1.1
matches two characters “1” separated by any single character, the regular expression
$ grep -E "1\.1" examples.txt
HTTP/1.1
matches only those separated by a literal dot. Keep in mind that the dot can be a punctuation mark,
a decimal point, or have any other meaning. Regular expressions don’t know anything about the
text that you are parsing; they just consider raw characters.
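The difference between the two patterns can also be checked by counting matching lines. The following is a small sketch on invented sample lines (an assumption, not taken from examples.txt), using the standard -c option of grep to count the lines that match.

```shell
# Hypothetical sample data, not the book's examples.txt
sample=$(printf '%s\n' 'version 1.1' 'room 101' 'item 1-1' 'year 2011')

# Unescaped dot: a 1, then any single character, then a 1
any_char=$(printf '%s\n' "$sample" | grep -cE "1.1")

# Escaped dot: a 1, then a literal dot, then a 1
literal_dot=$(printf '%s\n' "$sample" | grep -cE "1\.1")

printf 'unescaped: %s lines, escaped: %s line\n' "$any_char" "$literal_dot"
# prints: unescaped: 3 lines, escaped: 1 line
```

The line "year 2011" matches neither pattern, because its two 1s are adjacent and there is no character between them for the dot to consume. Inside double quotes the shell leaves \. untouched, so grep receives the backslash; single quotes would work as well and protect the pattern from any shell interpretation.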
The other important tool that can use regular expressions is sed, which we already met twice in the
previous chapters. To use the same extended syntax as grep -E in sed you need the -r option, which
makes the search pattern in an s/ command an extended regular expression.
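As a minimal sketch of the dot inside a sed substitution, the following replaces the first d and the character after it on each line with XX. The input lines are made up for illustration; -r is the GNU sed spelling used in this chapter, and GNU sed also accepts -E, which is the form other sed implementations generally expect.

```shell
# Hypothetical input lines, fed through a pipe instead of a file
result=$(printf '%s\n' 'dog' 'wild hog' | sed -r 's/d./XX/')
printf '%s\n' "$result"
# prints:
# XXg
# wilXXhog
```

In the second line the dot matched the space after the d, so the two words end up joined by the replacement text.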